Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

de: consider local name only for namespaced tags in structs with $value #736

Closed
wants to merge 2 commits into from

Conversation

Xiphoseer
Copy link
Contributor

This updates the "not_in" check that decides whether to pass a new start tag within a struct to a $value field, to only consider the local part of a QName. It now uses the same decode_name function as the QNameDeserializer that is used for keys/fields already to ensure they stay in sync.

Using the namespaced name in serde (i.e. #[serde(rename = "ns1:tag")]) fails with Custom("missing field `ns1:tag`") today, so this will not break existing code.

Might help with #347

This updates the "not_in" check that decides whether to pass a
new start tag within a struct to a $value field, to only consider
the local part of a QName. It now uses the same decode_name function
as the QNameDeserializer that is used for keys/fields already to
ensure they stay in sync.

Using the namespaced name in serde (i.e. `#[serde(rename = "ns1:tag")]`)
fails with ``Custom("missing field `ns1:tag`")`` today, so this will
not break existing code.

Might help with tafia#347
@Xiphoseer Xiphoseer changed the title de: consider local_name only for namespaced tags in structs with $value de: consider local name only for namespaced tags in structs with $value Apr 12, 2024
@Xiphoseer
Copy link
Contributor Author

Xiphoseer commented Apr 12, 2024

Looks like I did not enable the "serialize" feature when running tests locally and got confused by the double negative.

I.e. not_in should say

yes, ns1:tag does not appear in [ some, namespace, ns1:tag ]

because the former is considered as tag and the latter is not yet supported

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.51%. Comparing base (29962e7) to head (745278a).
Report is 3 commits behind head on master.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #736      +/-   ##
==========================================
+ Coverage   61.48%   61.51%   +0.03%     
==========================================
  Files          38       38              
  Lines       16213    16224      +11     
==========================================
+ Hits         9968     9981      +13     
+ Misses       6245     6243       -2     
Flag Coverage Δ
unittests 61.51% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test if this actually fixes #347? Add regression test into ./tests/serde-issues.rs. Try to keep original code as much as possible to be sure that's user problem is actually solved, but instead of printing results use assertions. That test is needed only if #347 is fixed.

If your patch does not fix the error, then I would like to see a test which will demonstrate working of this fix, so it will not break in the future (you are also free to add it anyway even if regression test fit our needs).

I also interested to check how it behaves with serialization. Ideally, we should be able to deserialize serialized data. I'm not sure that this is always true now, but we strive for this. So if your patch fix that aspect, it also would be welcome to be presented in a test.

When test will demonstrate changes that was made, please add an entry to Changelog.md with a short description of what is fixed.

@@ -23,7 +23,7 @@ macro_rules! deserialize_num {
/// The method will borrow if encoding is UTF-8 compatible and `name` contains
/// only UTF-8 compatible characters (usually only ASCII characters).
#[inline]
fn decode_name<'n>(name: QName<'n>, decoder: Decoder) -> Result<Cow<'n, str>, DeError> {
pub(super) fn decode_name<'n>(name: QName<'n>, decoder: Decoder) -> Result<Cow<'n, str>, DeError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move that function to de/mod.rs instead, so it will be available in all submodules.

@Mingun Mingun added serde Issues related to mapping from Rust types to XML namespaces Issues related to namespaces support labels Apr 12, 2024
@Xiphoseer
Copy link
Contributor Author

Xiphoseer commented Apr 12, 2024

Hey, thank you for the review. My case is probably closer to #212, but that was closed as a duplicate of #347.

I agree that a regression test would be useful so I'll add one when I find time for it. This change doesn't do anything to fix serialization, but it tries to improve upon the current behavior of only considering the local part of a QName for deserialization.

My actual case is as follows:

<ns1:parent>
    <ns1:field />
    <ns1:one />
</ns1:parent>

with the following data model

#[derive(Deserialize)]
#[serde[renameAll = "camelCase")]
enum Choice {
  One,
  Two
}

#[derive(Deserialize)]
struct Field;

#[derive(Deserialize)]
#[serde[renameAll = "camelCase")]
struct Parent {
  field: Field,
  #[serde(rename = "$value")]
  choice: Vec<Choice>,
}

Today, this will try to find "field" as a variant in "Choice" because the not_in && has_value_field case triggers when deserializing the struct.

With the change in this PR, it will correctly identify "field" as a key on the struct even for "ns1:field".

Renaming the field to "ns1:field" via serde does not work, which I guess is what #347 is about: The not_in would trigger but the QNameDeserializer will pick just the local part and fail to find the field so you get "missing field" errors.

I imagine that adding a toggle in Deserializer (use local name vs full name for decode_name) could be a backwards compatible change to resolve #347.

@Mingun Mingun linked an issue Apr 29, 2024 that may be closed by this pull request
@Mingun
Copy link
Collaborator

Mingun commented Apr 29, 2024

I checked, and this PR does not solve #347, but solve #683, because it actually does the same thing as #702.

Unfortunately, #683 and #347 seems to contradict each other. #347 requires to consider prefix part of a tag. Well, it should be possible to fix both issues, but this simple fix just solve the #683 case, but makes us a little more far from solving #347 (in particular, this change makes it impossible to deal with that example).

I would prefer to investigate possibilities to solve both cases (#683 and #347).

cc @leftmostcat

@leftmostcat
Copy link
Contributor

I would prefer to investigate possibilities to solve both cases (#683 and #347).

Would it make sense to, unless the field is named $text or $value, parse the field name as an XML name and when deserializing, only compare local name to local name?

@Xiphoseer
Copy link
Contributor Author

FWIW this question sent me down a rabbit hole of "canonical" encodings of a namespaced name and/or QName into a single string in XML and whether it's appropriate to use that as a field identifier. Turns out that URI isn't the answer to that, instead the {http://example.com/namespace}element syntax seems to be established for it. If we assume the @attribute syntax is indeed inspired by XPath, it might be reasonable to explore Q{http://example.com/namespace}element from XPath 3.0 for namespace support.

Furthermore, I think it would be wrong to use a serde field name of ns1:element unless going for a pure, pre https://www.w3.org/TR/xml-names/ tag/attribute name handling mode. QNames are defined to be equivalent if their namespace and local name match, so attaching meaning to a specific prefix string value via making it part of the serde property would prevent its use for certain well-formed documents.

In my mind, #347 is more about making sure the roundtrip works without different configuration for serialization and deserialization, regardless of any particular style of #[serde(...)] attributes. Suggesting that it's valid to serialize with an arbitrary namespace prefix by annotating the field with a literal prefix ns1: not tied to some xmlns:ns1 machinery is, in my understanding of XML Namespaces, a flawed assumption.

All that to say: After all that reading, I still think using the local part only is the correct first step toward full namespace support.

@Mingun
Copy link
Collaborator

Mingun commented May 13, 2024

Furthermore, I think it would be wrong to use a serde field name of ns1:element unless going for a pure, pre https://www.w3.org/TR/xml-names/ tag/attribute name handling mode.

Yes, I agree with that.

All that to say: After all that reading, I still think using the local part only is the correct first step toward full namespace support.

Yes, that's what I say in my previous comment.

@Xiphoseer
Copy link
Contributor Author

Furthermore, I think it would be wrong to use a serde field name of ns1:element unless going for a pure, pre https://www.w3.org/TR/xml-names/ tag/attribute name handling mode.

Yes, I agree with that.

All that to say: After all that reading, I still think using the local part only is the correct first step toward full namespace support.

Yes, that's what I say in my previous comment.

In that case, would it be possible to merge this PR as-is and do improvements to namespace handling as follow ups? Right now this bug (QNameDeserializer and not_in are inconsistent) severely reduces the usability of $value.

@Mingun
Copy link
Collaborator

Mingun commented May 13, 2024

Yes, I'll combine your PR and #702, add tests and changelog. I'll plan to release new version soon, maybe this weekend. Unfortunately too many job at work and Kreia in KotOR2 is so intriguing. Anyway, thanks for your work!

@Mingun
Copy link
Collaborator

Mingun commented May 13, 2024

Closed by #702

@Mingun Mingun closed this May 13, 2024
@Xiphoseer Xiphoseer deleted the deserialize-local-name branch May 13, 2024 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
namespaces Issues related to namespaces support serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repeated choice plus other field
4 participants